relational API
Back to DuckDB Data Engineering Glossary
The relational API in DuckDB provides a fluent, Pythonic interface for constructing and executing SQL queries programmatically. It allows data analysts and engineers to build complex queries step-by-step using method chaining, without writing raw SQL strings. This API is particularly useful when working with DuckDB in Python environments, such as Jupyter notebooks or data processing scripts.
The relational API exposes methods like filter()
, select()
, aggregate()
, and order()
that correspond to SQL clauses. For example, instead of writing:
Copy code
SELECT name, age FROM users WHERE age > 18 ORDER BY name
You can use the relational API like this:
Copy code
result = duckdb.sql("FROM users").filter("age > 18").select("name, age").order("name")
This approach offers several advantages:
- It's more readable and maintainable for complex queries.
- It allows for dynamic query construction based on runtime conditions.
- It integrates seamlessly with Python's control flow and data structures.
The relational API also provides methods for executing queries and retrieving results in various formats, such as pandas DataFrames or Arrow tables, making it a powerful tool for data manipulation and analysis in Python-based workflows.
For more information and detailed usage examples, refer to the DuckDB Python API documentation.